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^ | 1 Abstract 

We propose a new Riemannian geometry for fixed-rank matrices that is specifically 
tailored to the low-rank matrix completion problem. Exploiting the degree of freedom of a 
quotient space, we tune the metric on our search space to the particular least square cost 
function. At one level, it illustrates in a novel way how to exploit the versatile framework 
of optimization on quotient manifold. At another level, our algorithms can be considered 
as improved versions of LMaFit, the state-of-the-art Gauss-Seidel algorithm. We develop 
necessary tools needed to perform both first-order and second-order optimization. In 
particular, we propose gradient descent schemes (steepest descent and conjugate gradient) 
and trust-region algorithms. We also show that, thanks to the simplicity of the cost 
■ function, it is numerically cheap to perform an exact linesearch given a search direction, 

which makes our algorithms competitive with the state-of-the-art on standard low-rank 
matrix completion instances. 
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1 Introduction 

The problem of low-rank matrix completion amounts to completing a matrix from a small 
number of entries by assuming a low-rank model for the matrix. This problem has been 
addressed both from the theoretical |CR08|, IGroll] as well as algorithmic viewpoints [RS05, 
ICCS08} lLB09l IMJD091 IKMQ 10} ISElOl IJMD101 iMHTTOl lBATT| IMBS1 1\ IMMBSIH IMMBS12] . 
A standard way of approaching the problem is by casting the low-rank matrix completion 
problem as a fixed-rank optimization problem with the assumption that the optimal rank r 
is known a priori as shown below 

min XeR nxm |^||Pn(X) -Pn(X)||| 
subject to rank(X) = r, 
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where X G M. nxm is a matrix whose entries known for indices if they belong to the subset 
G fi, where H is a subset of the complete set of indices ■ i G {1, ...,n} and j G 

{1, ...,m}}. The operator VnO^ij) = Xy if G and VnO^ij) = otherwise is called 
the orthogonal sampling operator and is a mathematically convenient way to represent the 
subset of entries. The objective function is, therefore, a mean least square objective function 
where || • \\p is Frobenius norm with \Q\ is the cardinality of the set (equal to the number 
of known entries). The search space R™ xm is the space of r — rank matrices of size n x m , 
with r <C min{m, n}. 

The rank constraint correlates the known with the unknown entries. The number of given 
entries |0| is typically much smaller than nm, the total number of entries in X. Recent 
contributions provide conditions on \Q\ under which exact reconstruction is possible from 
entries sampled uniformly and at random [CR081 ICCS081 IKMO10] . 

A popular way to tackle the rank-constraint in (TIJ is by using a factorization model. In this 
paper, we pursue our research on the geometry of factorization models. It builds upon earlier 
works |Meyll IMBS1H [MMBS12] which describe factorization models and optimization on 



the resulting search space. The geometries described in |Meyll [MMBS12J only deal with 



quotient structure that results from the symmetries of factorization models. Here we move 
a step forward by tuning the metric to the cost function. The resulting geometry reduces to 
the state-of-the-art algorithm LMaFit [WYZlOj special case of our optimization scheme. 
We develop tools necessary to propose efficient first-order and second-order optimization 
algorithms with the new geometry. Our simulations suggest that the proposed algorithms 
scale to large dimensions and compete favorably with the state-of-the-art. 

At the time of submission of this paper, the authors were alerted (by Google Scholar) 
of the preprint |NS12| . At the conceptual level, the two papers present the same idea, each 
with its own twist. The twist of the present contribution is to emphasize that the new metric 
is one particular selection among Riemannian quotient geometries of low-rank matrices. It 
underlines the value of a geometric framework that parameterizes the degrees of freedom of 
the metric (the optimization of which is problem dependent) while providing a general theory 
of convergence and algorithms (the collection of which is problem independent). 

2 Exploiting the problem structure 

Efficient algorithms depend on properly exploiting both the structure of the constraints and 
the structure of the cost function. A natural way to exploit the rank constraint is through ma- 
trix factorization, which leads to a quotient structure of the search space |Meyll , |MMBS12j. 



We review one of the geometries below. The structure of the cost function is then exploited 
to select one particular metric on the quotient manifold. 

The quotient nature of matrix factorization 

A r — rank matrix X G R™ xm is factorized as 

X = GH T (2) 
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where G £ M™ xr and H £ R™ xr are full column-rank matrices. Such a factorization is not 
unique as X remains unchanged by scaling the factors 

(G, H) i — y (GM-\ HM T ), (3) 

for any non-singular matrix M £ GL(r), the set of r x r non-singular matrices and we have 
X = GH T = GM 1 (HM T ) T [PQ99] . 

The classical remedy to remove this indeterminacy is Cholesky factorization, which re- 
quires further (triangular-like) structure in the factors. LU decomposition is a way forward 
|GVL96j . In contrast, we encode the invariance map ([3]) in an abstract search space by 
optimizing over a set of equivalence classes defined as 

[(G,H)] = {(GM- 1 ,HM T ) : M £ GL(r)}. (4) 

The set of equivalence classes is termed as the quotient space of A4 r by GL(r) and is denoted 
as 

M r ■= M r /GL(r), (5) 

where the total space M r is the product space M™ xr x M™ xr . 

To do optimization on the abstract search space M r we need to select a metric on the 
total space A4 r such that the quotient search space is a Riemannian submersion, Section 6.3.2 
of |AMS08| . The metric should make the inner product between tangent vectors invariant 
along the equivalence class [(G,H)]. This is the only constraint imposed by the geometry of 
the search space. 

Scaling 

A further selection constraint for the metric is to look at the Hessian of the objective function. 
The Newton algorithm is a scaled gradient descent algorithm, in which the search space is 
endowed with a Riemannian metric induced by the Hessian of the cost function [Nes03, 
INW06| . In many cases, using the full Hessian information is computationally costly. A 
popular compromise between convergence and numerical efficiency is to scale the gradient by 
the diagonal elements of the Hessian [NW06J. If we vectorize (column-on-column) G — > g 
and H — > h and stack them on another like 

9 
h 

then the full Hessian will be a symmetric {n+m)r x (n+m)r matrix but not necessarily positive 
definite as the objective function is non-convex in the variables g and h together. However, 
the diagonal elements are going to be strictly positive because, fixing h, the objective function 
is strictly convex in g and vice versa. 

The full Hessian for the matrix completion function is given in [BF05j where the au- 
thors deal with non-positive definiteness of the Hessian by shifting the eigenvalues using the 
Levenberg-Marquardt optimization method [NW06] . However, the resulting optimization al- 
gorithm is numerically costly and unsuitable for large dimensions |Che08| . Hence, we avoid 
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using the full Hessian information and instead concentrate on the diagonal elements of the 
Hessian of the matrix approximation function 

||X-GH T ||| (6) 

Minimizing this function with respect to G and H amounts to learning the dominant r — rank 
subspace of X [GVL96J . The diagonal elements of the Hessian of the matrix approximation 
cost function (which number (n + m)r) have a simple form. Let 

Hess 9 
Hess/j 

be the diagonal approximation of the full Hessian of the matrix approximation function where 
Hess 9 is a diagonal matrix size nr x nr and Hess^ is a diagonal matrix of size mr x mr. 
Scaling the gradients (with respect to vectorized formulation) with the diagonal elements of 
the Hessian [NW06] would then mean following transformation 

~§g * Hess 3 1 ^ 

JL v Hess -1 

dh f niibb h dh 

where € R rar and G W nr are the Euclidean partial derivatives in vectorized form. In 
matrix form (just rearranging the terms) and substituting Hess 5 and Hess/j appropriately, we 
have the following equivalent transformation 



& — ► (^)[diag(H r H)] -1 

J_^ ( J_) [diag(G T G)] -l 



(7) 



where ^ and are the Euclidean partial derivatives with respect to matrices G and H. 
diag(H T H) is a diagonal matrix extracting the diagonal of H r H. This seems to be a suitable 
scaling for a gradient descent algorithm for minimizing the matrix approximation problem 



This also seems to be a reasonable scaling for algorithms directed towards the matrix 
completion problem (pQ). This is because, when we complete a low-rank matrix, we implicitly 
try to learn the dominant subspace, by learning on a smaller set of known entries. 

A new metric on the tangent space 

The tangent space of the total (product) space M. r has the following characterizations due 
to the product structure T s M. r at x = (G, H) G M. r has the expression 

TjA r = R nxr x M mxr . 

To choose a proper metric on the total space A4 r we need to take into account the symmetry 
([3]) imposed by the factorization model and scaling, similar to (|7|). The new metric, we propose 
is 

9s&,fk) = Tr((H r H)^f/ G ) + Tr((G T G)^fy H ) (8) 
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where x = (G,H) and £ x ,ri x £ T x M. r . Note that this is not the right-invariant metric 
introduced in [MBSllJ which is 

9*&,fk) = Tr((G T G)- 1 ^%) + Tr((H r H)- 1 ^ H ). (9) 

Both these metrics lead to complete metric spaces and both satisfy the symmetry condition ([5]) 
but they differ in the scaling. The right-invariant metric is motivated purely by the geometry 
of the product space. It is not tuned to a particular cost function. On the other hand, our 
choice is motivated by the geometry as well as the particular least square cost function of the 
matrix completion problem. With the right-invariant metric and our new metric, the scaling 
of the gradients are characterized as 

New metric: & -> (^(H^H)-*, & -> (^(G^G)" 1 

Right - invariant metric : ^ — ► (^)(G T G), — ► ( Jj)(H T H) 

The scaling obtained by our metric in some way inverse to that by the right-invariant metric. 
Consider the case when G r G = H r H (balancing update as called in |Meyll| ), the scaling is 
indeed inverse. In fact comparing (j5J) and ([9]) suggests that the right-invariant metric (|9|) is 
not well suited to the matrix completion, explaining the poor performance often observed in 
simulation (in particular instances as shown later in Section [5.11 ). The proposed new metric 
dH|) tries to take into account the scaling, as much as possible. 



3 Optimization related ingredients 

Once the metric (JSJ is chosen on the total space M r , the discussion relating the quotient 
space M r with the new metric follows directly from |AMS08| . To perform optimization on 
the search space we need the notions of the following concepts. These can be directly derived 
by following the steps in [AMS08]. 

Notions of optimization on the quotient space 

For quotient manifolds M. r = M. r / ~ a tangent vector £ x 6 T x M. r at x = [x] is restricted to 
the directions that do not induce a displacement along the set of equivalence classes. This 
is achieved by decomposing the tangent space in the total space T x A4 r into complementary 
spaces 

TxM r = VsMr © Ux~M r . 

Refer Figure [TJ for a graphical illustration. The vertical space V x A4 r is the set of directions 
that contains tangent vectors to the equivalence classes. The horizontal space ~H x Ai r is the 
complement of V x A4 r in T x Ai r . The horizontal space provides a representation of the abstract 
tangent vectors to the quotient space, i.e., T X M. T . Indeed, displacements in the vertical space 
leave the matrix X (matrix representation of the point x) unchanged, which suggests to 
restrict both tangent vectors and metric to the horizontal space. Once T x A4 r is endowed 
with a horizontal distribution 7i x M r , a given tangent vector £ x € T x A4 r at x in the quotient 
manifold M. r is uniquely represented by a tangent vector £ s £ T-L x Ad r in the total space Ai r . 
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• x = [x] 

Figure 1: The search space is the abstract quotient space A4 r . The points y and x in A4 r 
belonging to the same equivalence class are represented by a single point [x] in the quotient 
space M. r . 

The tangent vector £ s £ H x A4 r is called the horizontal lift of £ x at x. Provided that the 
metric defined in the total space is invariant along the set of equivalence classes. A metric 
9x(^x,Cx) in the total space defines a metric g x on the quotient manifold. Namely, 

9x{ixXx) ■= 9x{ixXx) (11) 

where £ x and £ x are the tangent vectors in T x A4 r and £g and Q x are their horizontal lifts in 
H X W. 

Natural displacements at x in a direction £ 2 € on the manifold are performed by 

following geodesies (paths of shortest length on the manifold) starting from x and tangent to 
£x- This is performed by means of the exponential map 

xt+i = Exp S( (s t ^ t ), 

which induces a line-search algorithm along geodesies with the step size St- However, the 
geodesies are generally expensive to compute and, in many cases, are not available in closed- 
form. A more general update formula is obtained if we relax the constraint of moving along 
geodesies. The retraction mapping R Xt (st£, Xt ) at xt locally approximates the exponential map- 
ping [AMS08]. It provides a numerically attractive alternative to the exponential mapping 
in the design of optimization algorithms on manifolds, as it reduces the computational com- 
plexity of the update while retaining the essential properties that ensure convergence results. 
A generic abstract line-search algorithm is, thus, based on the update formula 

x t+1 = R Xt {stix t ) (12) 

where st is the step-size. A good step-size is computed using the Armijo rule jNW06 . 

Similarly, second-order algorithms on the quotient manifold W are horizontally lifted and 
solved in the total space W. Additionally, we need to be able to compute the directional 



7 




r 



Figure 2: A line-search on a Riemannian quotient manifold A4 r in the total space Ai r . 
Conceptually, we move on the quotient manifold M. T from xt to xt+i but computationally in 
the total space M. r . The search direction is £g t and belongs to the horizontal space %x t M. r at 
point x t - The retraction mapping maintains feasibility of the iterates. The blue line denotes 
the equivalence class [xt]. 



derivative of gradient along a search direction. This relationship is captured by an affine 
connection V on the manifold. The vector field V,j£ implies the covariant derivative of vector 
field rj with respect to the vector field £. In the case of W being a Euclidean space, the affine 
connection is standard 

(V £?? ) =lim %+ ^ ~ Vx . 

v iUx t^o t 

However, for an arbitrary manifold there exists infinitely many different affine connections 
except for a specific connection called the Levi-Civita or Riemannian connection which is 
always unique. The properties of affine connections and Riemannian connection are in Section 
5.2 and Theorem 5.3.1 of [AMS08J. The Riemannian connection on the quotient manifold W 
is given in terms of the connection in the total space W once the quotient manifold has the 
structure of a Riemannian submersion [AMS08J. 

Horizontal space and projection operator 

A matrix representation of the tangent space of the abstract search space M r at the equiva- 
lence class x = [x] € W relies on the decomposition of T s JV[ r into its vertical and horizontal 
subspaces. The tangent space is decomposed into the vertical and horizontal space which 
have the following expressions 

VxM r = {(— GA, HA T ) | AeR rxr } and 

n s M r = {{CG,Cn) I G t CgH t H = G T GC£H, ( G eR nxr X n eR mxr } M '" 

where ( s £ T s M r and for any A G R rxr . 

Apart from the characterization of the Horizontal space, we need a mapping Tl s ■ T s A4 r i— > 
1-Lx-Mr that maps vectors from the tangent space to the horizontal space [ABG07J. Projecting 
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an element rj x G T x M. r onto the horizontal space is accomplished with the operator 

n.fe)= (f/ G + GA,r?H-HA T ) (14) 

where A 6 R rxr is uniquely obtained by noticing that II 5 (r^) belongs to the horizontal space 
described in ([TBI and hence, we have 

G r fe + GA)H T H = G t G(t7h-HA t ) T H 

2G T GAH T H = G T G^H - G T 7/ G H T H (15) 

=> A = 0.5 [r^H^H)- 1 - (G T G)- 1 G T r ?G ] . 

which has a closed form expression. 



Retraction 

A retraction is a mapping that maps vectors in the horizontal space to points on the search 
space M r and satisfies certain conditions (Definition 4.1.1 in [AMS08]). We choose a simple 
retraction |AMS08j 

Rg(vg) = G + fj G 



(16) 

^h^h) = H + r/ H 



where rj x € T-L x M. r - 



Vector transport 

A vector transport T Vx (, x on a manifold M. r is a smooth mapping that transports the vector £ x 
at a; to a vector in the tangent space at x + rj x satisfying certain conditions. Refer Definition 
8.1.1 in [AMS08]. When the total space is an open subset of the Euclidean space which is our 
case, the vector transport is given in terms of the projection operator (j!4|) and is computed 
as in Section 8.1.4 in |AMS08j 

Exact linesearch 

Because of the simplicity of the cost function and simple retraction used, an exact linesearch is 
numerically feasible. Given a search direction fj £ T~L x A4 r at a point x = (G,H), the optimal 
step-size t* is computed by solving the minimization problem 

r = argmin t ||P n (X) - (G + ^ G )(H + t Vn ) T \\ 2 F . (17) 

Note that this is a degree 4 polynomial in t. The optima are the roots of the first derivative 
which is a degree 3 polynomial. Efficient algorithms exist (including closed-form expressions) 
for finding the roots of a degree 3 polynomial and hence, finding the optimal t* is numerically 
straight forward. 

Note also that for a gradient descent algorithm, computing t* is still numerically more 
costly than using an approximate step-size procedure like adaptive linesearch in [NW06, 
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MMBS12]. Hence, we do not compute the optimal step-size at each iteration for our gradient 
descent implementation. We use it only in the very beginning to guess a right step-size 
order and then use our adaptive step-size procedure described in [MMBS12J for subsequent 
iterations. 

However, an optimal step-size has a significant effect on a conjugate gradient algorithm 
[NW06J. The extra cost of computing the optimal step-size is compensated by a faster rate 
of convergence. 

Riemannian connection 

For the trust-region scheme we additionally need the notion of Riemannian connection in the 
total space A4 r on the search space M r . The Riemannian connection generalizes the idea of 
directional derivative of a vector field on the manifold. In particular, we are interested in the 
notion on the quotient space. The horizontal lift of the Riemannian connection V^^, the 
directional derivative of the vector field £ x € T x Ai r in the direction r/ x € T x A4 r , is given in 
terms of the Riemannian connection on the product space M r , 

%j~ x = n 2 (v w fx) (is) 

which is the horizontal projection of the Riemannian connection onto the horizontal space, 
Proposition 5.3.3 in [AMS08]. It now remains to find out the Riemannian connection on the 
total space M r . We find the expression by invoking the Koszul formula, Theorem 5.3.1 in 
[AMS08J. After a routine calculation, the final expression 

= D|[^] + (Ag,A h ), where 
A G = e G Sym(^H)(H T H)- 1 +f? G Sym(eHH)(H T H)- 1 -GSym(eHr/ H )(H T H)- 1 
A H = ? H Sym(r/ G G)(G T G)- 1 +r? H Sym(e G G)(G T G)- 1 -HSym(e G r/ G )(G T G)- 1 

(19) 

and D£[t7] is the classical Euclidean directional derivative and Sym extracts the symmetric 
part, Sym(Z) = z ~ t 2 z where Z is a square matrix. 



4 Algorithms 

Gradient descent schemes 

In the previous section, we have developed all the necessary ingredients to propose gradient 
descent schemes. If (G,H) is the current iterate, we take a simultaneous step in both the 
variables in the negative Riemannian gradient direction with the new metric ([8]) (G,H), 
i.e., (SH(H T H)-\ S T G(G T G)" 1 ) where S = ^ (V^(GH T ) - P n (X)) is the Euclidean 
gradient in the space M. nxm . Note that ^q- = SH (obtained by chain rule) and the scaled 
version is post multiplied by (H T H) _1 . Similarly, = S T G and the scaled version is post 
multiplied by (G T G) _1 . The update rule, thus, is 

G+ = G — tSH(H T H) _1 

H+ = H-tS T G(G T G)- 1 (20) 
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with the step-size t > 0. In our implementation, we perform an Armijo linesearch with an 
adaptive step-size procedure [MMBS12] to update t . 

Similarly, a geometric conjugate gradient (CG) can also be described. We implement 
Algorithm 13 of [AMS08] with the tools developed here. Once a conjugate gradient direction 
is obtained, the optimal step-size is then computed by solving (fTTl) . 

Trust-region scheme 

In a trust-region scheme we first build a locally second-order model of the objective function 
in a neighborhood and minimize the model function to obtain a candidate iterated. This is 
called the trust-region subproblem. Depending on the obtained and the desired decrease, the 
neighborhood is modified [NW06J. 

Analogous to trust-region algorithms in the Euclidean space, trust-region algorithms on a 
quotient manifold with guaranteed quadratic rate convergence have been proposed in [ABG07, 
IAMS 08], In particular, the trust-region subproblem on the abstract space Ai r is horizontally 
lifted to Hx-Mr and formulated as 

min f?e-H^ r + 9x(v,grad(l)(x)) + lg x (fj,Kess<l>(x)[r)]) 

subject to gx(ViV) < s - 

where 5 is the trust-region radius and grad</> and Hess(/> are the horizontal lifts of the Rieman- 
nian gradient and its directional derivative (covariant derivative in the direction ry denoted by 
Hess^>[r/]) of the least square objective function c/> on A4 r . Here (f> x = pr |?sj(GH T )- Pn(X)|||, 
at x = (G,H). 

The covariant derivative Hess0[r/] is computed using the relations (I18D and (I19p 

Hess</>[ry] = V^gradc/) = IIj (V^grad^) . 

The trust-region subproblem is solved efficiently using the generic solver GenRTR [B AG07] . 
Refer |ABG071IBTG07] for the algorithmic details. The output is a direction r\ that minimizes 
the model. We initialize the trust-region radius S 

5 = t ||grad^(x ) ll^ir ( 22 ) 

where to > is the optimal stepsize in the negative grad0(xo) direction at the initial iterate 
xo ^ -Mr- The exact line-search procedure has been described earlier. The overall bound on 
the trust-region radii, 5 is fixed at 2 10 <5o (relatively a large number, we allow 10 updates). For 
subsequent iterates 5 is updated as in Algorithm 4.1 in [NW06 . 

Connection with LMaFit 

The LMaFit algorithm [WYZlOj for the low-rank matrix completion problem has been a 
popular benchmark owing to simpler updates of iterates and tuned step-size updates in turn 
leading to a superior time per iteration complexity. The algorithm relies on the factorization 
X = GH T to alternatively learn the matrices X, G and H so that the error ||X — GH T ||^ 
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is minimized while ensuring that the entries of X agree with the known entries i.e., 'Pq(X) = 
7-*q(X). The algorithm is a tuned version of the block-coordinate descent algorithm in the 
product space that has a superior computational cost per iteration and better convergence 
than the straight forward non- linear Gauss-Seidel scheme (GS algorithm). We show here both 
the non-linear GS and LMaFit updates. 

The GS algorithm is an alternating minimization scheme, when each variable is updated 
first and used in the update of the other variables. However, instead of sequential updates of 
the variables, a simultaneous update of the variables has the following form. Given an iterate 
(G,H), simultaneously updating the variables to (G + ,H + ) 

G+ = G-SH(H T H) 1 

H+ = H-S T G(G T G)- 1 (23) 



where S = py \ Vn(GH T ) — Vn(X.)J is the Euclidean gradient of the objective function ([T]) 
in the space M. nxm . As mentioned before, LMaFit is a tuned version of the GS updates (|23p . 
the updates are written as 

G+ = (1 - u)G + w(G - SH^H)- 1 ) 

H+ = (l-w)H + w(H-S T G(G T G)- 1 ) 1 ' 

where the weight oj > 1 is updated in an efficient way [WYZ10 . The connection between 
(simultaneous version of) LMaFit and our gradient descent algorithm is straight forward by 
looking at the similarity between (|2Up and (|24|) . LMaFit is a particular case of our geometric 
gradient descent algorithm ([20]) . 



5 Numerical simulations 

We compare our algorithms with those from the right-invariant geometry [MMBS12], LMaFit 
|WYZ10] and LRGeom [Vanll] , Unlike the right-invariant geometry and LMaFit, LRGeom 
is based on the embedded geometry which is different from the geometries based on a fac- 
torization model. This view simplifies some notions of the geometric objects that can be 
interpreted in a straight forward way. In the matrix completion case, this allows to compute 
the initial guess for the Armijo line-search efficiently |Vanllj . 

For the gradient descent implementations of our geometry and the right-invariant geom- 
etry we use the Armijo lineseach procedure with the adaptive step-size update proposed in 
[MMBS12]. For the gradient descent implementation of LRGeom, we use their step-size guess 
at each iteration. 

For the CG implementations, we use the robust P — R + update for the parameter (5 
|NW06| . We use an exact lineseach procedure for both our new geometry and the right- 
invariant geometry. For the CG implementation of LRGeom, we use their step-size guess for 
each iteration. 

For trust-region algorithms, we also compare with algorithms based on our new geometry, 
the right-invariant geometry in [MMBS12] and the embedded geometry in [Van ll], The trust- 
region subproblem is solved using the GenRTR package |BAG07j and the trust -region radius 
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for the algorithms based on the new geometry and the right-invariant geometry are initialized 
as in (122]) . 

All simulations are performed in MATLAB on a 2.53 GHz Intel Core i5 machine with 4 
GB of RAM. We use MATLAB codes of the competing algorithms for our numerical studies. 
The codes are available from the respective authors' webpages. For each example, a n x m 
random matrix of rank r is generated according to a Gaussian distribution with zero mean 
and unit standard deviation and a fraction of the entries are randomly removed with uniform 
probability. The dimensions of n x m matrices of rank r is (ra+m— r)r. The over-sampling (OS) 
ratio determines the number of entries that are known. A OS = 6 means that 6(n + m — r)r 
number of randomly and uniformly selected entries are known a priori out of ran entries. 
Numerical codes for the proposed algorithms for the new geometry are available from the 
first author's homepage. 

Three instances are considered where we demonstrate the efficacy of our framework. We 
randomly initialize all the competing algorithms. The algorithms are terminated once the 
cost, mll'Po(X) — Vni^Wp is below 10~ 20 or the number of iterations exceeds 500. 

5.1 Comparison with gradient schemes 
A large-rank instance 

Consider a case when n = m = 1000 and r = 50 with OS = 5, which implies that we know 
49% of the entries (which is a big number). In this case, one would expect LMaFit to perform 
better than both gradient descent and CG algorithms primarily because the parameter uj is 
properly tuned. As more and more entries are known, the completion problem ([1]) approaches 
the matrix approximation problem ([6]) problem. One would also expect the right-invariant 
geometry of [MBSllj to perform poorly on this instance with random initialization as it 
neglects scaling that is required (jlOp 0. In fact, the right-invariant geometry inversely scales 
the gradient (flO)) . The plots are shown in Figure [3j 

With an initialization proposed in [KMOIO] , the poor performance of the right-invariant algorithms di- 
minishes. 
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Figure 3: LMaFit performs extremely well when a large number of entries are known. The 
instance considered here involves n = m = 1000 and r = 50 completion with OS = 5. The 
step-size parameter of LMaFit is properly tuned for these instances. Top: Gradient descent 
algorithms. Bottom: Conjugate gradient algorithms. The right-invariant geometry performs 
poorly with random initialization. 
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Figure 4: Competitive performance our geometry both for CG and gradient descent schemes. 
n = m = 10000 and r = 5 with OS = 5. LMaFit performs poorly for these instances as the 
parameter tuning is not proper. Top: Gradient descent algorithms. Bottom: Conjugate 
gradient algorithms. 

Consider the case n = m = 10000 and r = 5 with OS = 5, which implies that we know 
0.5% of the entries (which is a small number). The plots are shown in Figure HI LMaFit 
in this case suffers from poor convergence because of the poor tuning of the parameter uj. 
All the gradient descent algorithms have a similar rate of convergence however, timing- wise 
the gradient descent implementations of our new geometry and right-invariant geometry have 
better timing than the gradient descent implementation of LRGeom, suggesting that the 
adaptive linesearch is competitive in a gradient descent setting. CG algorithms based on our 
new geometry outperform others. 

Consider a still bigger example of n = m = 32000 and r = 10 with OS = 3. In this 
case we know 0.12% of the entries (which is a very small number). The performance of 
different algorithms are shown in Figure [SJ Our CG scheme convincingly outperforms other 
CG schemes. The gradient descent schemes our geometry and the right-invariant geometry 
seem to perform similarly. 
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Figure 5: Competitive performance our geometry on a problem instance, n = m = 32000 and 
r = 10 with OS = 3. Our CG scheme outperforms others. Top: Gradient descent algorithms. 
Bottom: Conjugate gradient algorithms. 



5.2 Comparison with trust-region algorithms 

We consider a large scale instance of size 32000 x 32000 of rank 10. The number of entries are 
uniformly revealed with OS = 5. All the algorithms are initialized similarly as in [K MO10] . 
by taking the dominant SVD of the sparse incomplete matrbj§. 

The plots in Figure show a favorable performance of our trust-region scheme. As ex- 
pected, our TR algorithm is superior to that of the right-invariant geometry. With respect 
to the embedded approach, the performance of the both the algorithms are similar both 
in the number of outer (as well as inner iterations for the trust-region subproblem) and in 
computational burden per iteration. 

2 The TR algorithms do not seem to compete favorably with the conjugate gradient algorithms for a number 
of instances when initialized randomly. 
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Figure 6: n = m = 32000 and r = 10 with OS = 5. Our trust-region scheme is clearly superior 
to that of the right-invariant geometry. Also our trust-region algorithm competes favorably 
with that of the embedded geometry in a number of situations. 



6 Conclusion 

We have described a new geometry for the set of fixed-rank matrices. The geometry builds on 
the framework of quotient manifold but the new metric (JSj) additionally exploits the particular 
cost function of the low-rank matrix completion problem. The proposed algorithms connect 
to the state-of-the-art LMaFit algorithm, which is viewed as a tuned (step-size) algorithm in 
our framework. We have developed all the necessary ingredients to perform first-order and 
second-order optimization with the new geometry. We have shown that an exact linesearch 
is numerically feasible with our choice of updating low-rank factors. The resulting class of 
algorithms compete favorably with the state-of-the-art in a number of problem instances. 
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